Centre-Based Hard and Soft Clustering Approaches for Y-STR Data

Authors

  • Ali Seman
  • Zainab Abu Bakar
Abstract

This paper presents Centre-based clustering approaches for Y-STR data. The main goal is to observe how the fundamental clustering approaches perform when partitioning Y-STR data. Two fundamental Centre-based hard clustering algorithms, k-Means and k-Modes, and two fundamental Centre-based soft clustering algorithms, fuzzy k-Means and fuzzy k-Modes, were evaluated on Y-STR haplogroup and Y-STR surname datasets. The results show that the soft (fuzzy) k-Means algorithm achieves the best average clustering accuracy on both the Y-STR haplogroup data (99.62%) and the Y-STR surname data (97.61%). Overall, the soft clustering approach (92.11%) outperforms the hard clustering approach (81.20%) on Y-STR data. However, clustering of Y-STR data should be investigated further to find an approach that achieves 100% clustering accuracy.
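
The difference between the hard and soft approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fuzzifier default m = 2, and the sample repeat counts are assumptions made for illustration only. It shows the dissimilarities usually paired with k-Means (squared Euclidean distance) and k-Modes (simple matching), a hard assignment to the single nearest centre, and the standard fuzzy membership formula that spreads one object across all centres.

```python
from typing import Callable, List, Sequence


def sq_euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance, the dissimilarity usually used by k-Means."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def simple_matching(x: Sequence[int], c: Sequence[int]) -> int:
    """Count of mismatched positions, the dissimilarity usually used by k-Modes."""
    return sum(1 for a, b in zip(x, c) if a != b)


def hard_assign(x, centres, dist: Callable) -> int:
    """Hard clustering: return the index of the single nearest centre."""
    d = [dist(x, c) for c in centres]
    return d.index(min(d))


def soft_memberships(x, centres, dist: Callable, m: float = 2.0) -> List[float]:
    """Soft clustering: fuzzy membership of x in every cluster (sums to 1).

    m > 1 is the fuzzifier (assumed default 2.0); values closer to 1 make
    the memberships behave more like a hard assignment.
    """
    d = [dist(x, c) for c in centres]
    # If x coincides with one or more centres, share membership among them.
    zeros = [i for i, v in enumerate(d) if v == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / v) ** power for v in d]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical allele repeat counts at three markers (not data from the paper).
haplotype = [14, 12, 23]
centres = [[14, 12, 24], [16, 13, 29]]

print(hard_assign(haplotype, centres, sq_euclidean))          # nearest centre index
print(soft_memberships(haplotype, centres, sq_euclidean))     # fuzzy k-Means style
print(soft_memberships(haplotype, centres, simple_matching))  # fuzzy k-Modes style
```

In this sketch the hard assignment discards how close the object is to the other centres, whereas the soft memberships keep that information as weights; a full algorithm would alternate such an assignment step with a centre update step until convergence.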

Similar resources

Centre-based Hard Clustering Algorithms for Y-STR Data

This paper presents Centre-based hard clustering approaches for clustering Y-STR data. Two classical partitioning techniques: Centroid-based partitioning technique and Representative object-based partitioning technique are evaluated. The k-Means and the k-Modes algorithms are the fundamental algorithms for the centroid-based partitioning technique, whereas the k-Medoids is a representative obje...

Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms

UCTTP is an NP-hard problem that must be solved anew each semester. The main technique in the presented approach is to analyze data to resolve uncertainties in lecturers’ preferences and constraints within a department, in order to rank each lecturer by their requirements, with the aim of increasing their satisfaction and develo...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features

This paper addresses an automatic classification of preposition types in German, comparing hard and soft clustering approaches and various window- and syntax-based co-occurrence features. We show that (i) the semantically most salient preposition features (i.e., subcategorised nouns) are the most successful, and that (ii) soft clustering approaches are required for the task but reveal quite diffe...

Performance Comparison of Hard and Soft Approaches for Document Clustering

The amount of information available through large shared sources such as search engines has grown tremendously. Fast, high-quality document clustering algorithms play an important role in helping users navigate vertical search engines and the World Wide Web, and in summarizing and organizing information. Recent surveys have shown that partitional clustering algorithms are more suitable...

Publication date: 2010